Four types of context for automatic spelling correction
Abstract
This paper investigates four types of contextual information for improving the accuracy of automatic correction of single-token non-word misspellings. The task is framed as contextually informed re-ranking of correction candidates. First, immediate local context is captured by word n-gram statistics from a Web-scale language model. The second approach measures how well a candidate correction fits the semantic fabric of the local lexical neighborhood, using a very large Distributional Semantic Model. The third approach recognizes a misspelling as an instance of a recurring word, which can be useful for re-ranking. The fourth approach looks at context beyond the text itself: if the approximate topic is known in advance, spelling correction can be biased towards that topic. The effectiveness of the proposed methods is demonstrated on an annotated corpus of 3,000 student essays from international high-stakes English language assessments. The paper also describes an implemented system that achieves high accuracy on this task.
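To make the re-ranking framing concrete, here is a minimal sketch, not the paper's system. It assumes each correction candidate arrives with a base score from some candidate generator, and it adds two of the four context signals in a simplified form: toy bigram counts standing in for the Web-scale n-gram model, and a count of how often the candidate already occurs in the same text. The function name `rerank`, the log scaling, the weights, and all counts are hypothetical.

```python
import math
from collections import Counter


def rerank(candidates, left, right, bigram_counts, doc_counts, weights=(1.0, 1.0)):
    """Re-rank (word, base_score) candidates with two context signals.

    Assumptions (illustrative only): log-scaled counts and a plain
    weighted sum; the source does not specify how signals are combined.
    """
    w_ngram, w_recur = weights
    scored = []
    for word, base in candidates:
        # Local context: bigram counts with the left and right neighbours.
        ngram = bigram_counts.get((left, word), 0) + bigram_counts.get((word, right), 0)
        # Recurrence: occurrences of the candidate elsewhere in the text.
        recur = doc_counts.get(word, 0)
        score = base + w_ngram * math.log1p(ngram) + w_recur * math.log1p(recur)
        scored.append((score, word))
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [word for _, word in scored]


# Toy data: the misspelling "exma" appears in "... the exma was ...".
candidates = [("extra", 0.6), ("exam", 0.5)]
bigram_counts = {("the", "exam"): 120, ("the", "extra"): 80, ("exam", "was"): 40}
doc_counts = Counter("the exam was long and the exam was hard".split())

with_context = rerank(candidates, "the", "was", bigram_counts, doc_counts)
without_context = rerank(candidates, "the", "was", bigram_counts, doc_counts, weights=(0.0, 0.0))
```

With both weights set to zero the ranking falls back to the base scores, so comparing `with_context` against `without_context` shows how much the context signals change the order.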
Similar resources
Presenting a ranker for a semantic spell checker using context-sensitive features
Nowadays, a large volume of documents is generated daily. These documents are written by different people and therefore contain spelling errors, which lower their quality. Automatic writing assistance tools such as spell checkers/correctors can therefore help to improve their quality. Context-sensitive errors are misspelled words that have been...
Spelling and Grammar Correction for Danish in SCARRIE
This paper reports on work carried out to develop a spelling and grammar corrector for Danish, addressing in particular the issue of how a form of shallow parsing is combined with error detection and correction for the treatment of context-dependent spelling errors. The syntactic grammar for Danish used by the system has been developed with the aim of dealing with the most frequent error types...
Towards Context-Dependent Phonetic Spelling Error Correction in Children's Freely Composed Text for Diagnostic and Pedagogical Purposes
Reading and writing are core competencies of any society. In Germany, international and national comparative studies such as PISA or IGLU have shown that around 25% of German school children do not reach the minimal competence level necessary to function effectively in society by the age of 15. Automated diagnosis and spelling tutoring of children can play an important role in raising their ort...
Design and implementation of Persian spelling detection and correction system based on Semantic
The Persian language has special features (graphemes, homophones, and multi-shape joining characters) that complicate its handling on electronic devices. Furthermore, the design and implementation of NLP tools for Persian are more challenging than for other languages (e.g. English or German). Spelling tools are widely used for editing user texts such as emails and text in editors. Also developing Persian tools will provide Persian progr...
Arabic Spelling Correction using Supervised Learning
In this work, we address the problem of spelling correction in the Arabic language, utilizing the new corpus provided by the QALB (Qatar Arabic Language Bank) project, which is an annotated corpus of sentences with errors and their corrections. The corpus contains edit, add before, split, merge, add after, move and other error types. We are concerned with the first four error types as they contribute...
Journal title:
- TAL
Volume 53, Issue
Pages -
Publication date: 2012